Skip to content

Conversation

@joaquim-verges
Copy link
Member

@joaquim-verges joaquim-verges commented Nov 21, 2025

Summary by CodeRabbit

Release Notes

  • Refactor
    • Optimized handling of failed transactions through efficient batch processing for improved system performance and reliability
    • Enhanced logging and diagnostic information for transaction processing operations

✏️ Tip: You can customize this high-level summary in your review settings.

@joaquim-verges joaquim-verges marked this pull request as ready for review November 21, 2025 08:40
@coderabbitai
Copy link

coderabbitai bot commented Nov 21, 2025

Walkthrough

A new batch-failure method is added to the atomic store to fail multiple pending transactions in a single Redis pipeline, reducing round-trips. The worker is refactored to collect non-retryable failures and invoke this batch method instead of failing transactions per-item, with added timing instrumentation around cleanup and budget tracking.

Changes

Cohort / File(s) Summary
Batch failure method
executors/src/eoa/store/atomic.rs
New public async method fail_pending_transactions_batch added to atomically fail multiple pending transactions in one Redis pipeline; performs batch ZREM on pending set, updates transaction hashes with completion metadata, queues webhook envelopes, and executes single pipeline (note: method appears added in two locations within the same file).
Worker refactoring for batch operations
executors/src/eoa/worker/send.rs
Refactored non-retryable failure handling to collect failures and invoke batch-failure method instead of per-item operations; added timing instrumentation around recycled nonce cleanup and separate timestamp for inflight budget calculation; changes applied to multiple control paths including preparation cleanup.

Sequence Diagram(s)

sequenceDiagram
    participant Worker as Worker (send.rs)
    participant Store as AtomicEoaExecutorStore
    participant Redis as Redis Pipeline
    
    rect rgb(200, 240, 255)
    Note over Worker,Redis: Old: Per-Item Failure
    Worker->>Store: fail_pending_transaction (Item 1)
    Store->>Redis: ZREM pending set + hash update + webhook queue + execute
    Worker->>Store: fail_pending_transaction (Item 2)
    Store->>Redis: ZREM pending set + hash update + webhook queue + execute
    end
    
    rect rgb(240, 200, 255)
    Note over Worker,Redis: New: Batch Failure
    Worker->>Worker: Collect non_retryable_failures: Vec<Item>
    Worker->>Store: fail_pending_transactions_batch (all items)
    Store->>Redis: ZREM all items + all hash updates + all webhooks queued
    Store->>Redis: Execute pipeline (single round-trip)
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Duplicate method insertion: The fail_pending_transactions_batch method is noted as being added in two locations within the same file—verify this is intentional or a merge conflict artifact.
  • Control flow changes: Multiple paths in send.rs refactored to collect and batch process failures; trace each path to ensure no non-retryable failures are missed or processed twice.
  • Timing instrumentation: New timestamp management for budget tracking and cleanup timing; confirm calculations are correct and don't introduce subtle timing bugs.
  • Webhook queueing in batch: Ensure webhook envelopes are correctly aggregated and their order/context is preserved across the batch.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding batch processing for failed pending transactions in the EOA executor, which is reflected in both the atomic store additions and the worker's batching logic.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch Add_batch_processing_for_failed_pending_transactions_in_EOA_executor

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2dbbab2 and 65d0d66.

📒 Files selected for processing (2)
  • executors/src/eoa/store/atomic.rs (1 hunks)
  • executors/src/eoa/worker/send.rs (3 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: d4mr
Repo: thirdweb-dev/engine-core PR: 5
File: executors/src/eoa/worker.rs:173-176
Timestamp: 2025-07-06T15:44:13.701Z
Learning: In the EOA executor system, aggressive lock acquisition is safe because every Redis state mutation uses WATCH operations on the lock key. If the lock is lost during a transaction, the WATCH causes the transaction to fail and the worker exits gracefully. This provides coordination between workers even when using forceful lock takeover.
📚 Learning: 2025-09-20T05:30:35.171Z
Learnt from: joaquim-verges
Repo: thirdweb-dev/engine-core PR: 48
File: executors/src/eoa/worker/send.rs:20-21
Timestamp: 2025-09-20T05:30:35.171Z
Learning: In executors/src/eoa/worker/send.rs, there is a critical bug where HEALTH_CHECK_INTERVAL is defined as 300 seconds but compared against millisecond timestamps, causing balance checks to occur every 300ms instead of every 5 minutes (1000x more frequent than intended).

Applied to files:

  • executors/src/eoa/worker/send.rs
📚 Learning: 2025-09-20T06:58:40.230Z
Learnt from: d4mr
Repo: thirdweb-dev/engine-core PR: 48
File: executors/src/eoa/store/submitted.rs:229-230
Timestamp: 2025-09-20T06:58:40.230Z
Learning: The diff in executors/src/eoa/store/submitted.rs shows correct brace structure - the first closing brace closes the remove_transaction_from_redis_submitted_zset method and the second closing brace closes the impl CleanSubmittedTransactions block. The change only adds whitespace formatting.

Applied to files:

  • executors/src/eoa/worker/send.rs
  • executors/src/eoa/store/atomic.rs
🧬 Code graph analysis (2)
executors/src/eoa/worker/send.rs (3)
executors/src/metrics.rs (2)
  • current_timestamp_ms (287-289)
  • calculate_duration_seconds (273-275)
executors/src/eoa/store/atomic.rs (2)
  • eoa (84-86)
  • worker_id (94-96)
executors/src/eoa/worker/error.rs (1)
  • is_retryable_preparation_error (250-281)
executors/src/eoa/store/atomic.rs (2)
executors/src/webhook/envelope.rs (3)
  • webhook_queue (128-128)
  • transaction_id (117-117)
  • transaction_id (121-123)
executors/src/webhook/mod.rs (1)
  • queue_webhook_envelopes (433-508)
🔇 Additional comments (4)
executors/src/eoa/worker/send.rs (3)

98-108: LGTM - Good observability improvement.

The timing instrumentation around recycled nonce cleanup provides valuable visibility into this phase of the send flow. The logging includes the count of remaining recycled nonces, which helps diagnose issues with nonce recycling.


111-116: LGTM - Improved timing accuracy.

Using a separate budget_start timestamp allows measuring the inflight budget acquisition duration independently from the total send flow duration. This improves the granularity and accuracy of performance monitoring.


320-374: LGTM - Efficient batch processing for non-retryable failures.

The refactor to collect non-retryable failures and process them in a single batch operation (lines 361-374) is a solid optimization that reduces Redis round-trips when multiple preparation failures occur simultaneously.

The error handling at lines 363-373 is appropriate: if batch processing fails, the error is logged with a clear message about potential stuck transactions, but processing continues. This prevents the worker from crashing while making the failure visible for investigation.

executors/src/eoa/store/atomic.rs (1)

628-704: LGTM - Well-implemented batch failure operation.

The batch failure method is correctly implemented and mirrors the single-transaction version at lines 571-626, ensuring consistency. Key strengths:

  1. Efficiency: Single ZREM operation (line 651) for all transaction IDs is the core optimization that reduces Redis round-trips.
  2. Atomicity: The entire batch is executed in a single pipeline (lines 689-692), ensuring all-or-nothing semantics.
  3. Error handling: Individual webhook queuing errors are logged (lines 679-683) but don't abort the batch, which is appropriate.
  4. Observability: The log at lines 694-701 provides clear visibility into batch operations.

The implementation correctly uses a shared transaction context (line 663) for all webhook jobs, ensuring they're queued as part of the same pipeline execution.


Comment @coderabbitai help to get the list of available commands and usage tips.

@joaquim-verges joaquim-verges merged commit a328989 into main Nov 21, 2025
3 of 4 checks passed
@joaquim-verges joaquim-verges deleted the Add_batch_processing_for_failed_pending_transactions_in_EOA_executor branch November 21, 2025 08:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants